[pull] main from llvm:main #5546

pull · 2025-01-16T01:14:23Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

…124247) Profiling of the LLVM Test-suite reveals that a significant portion of loop nests, specifically 14,090 out of 139,323, were identified as non-viable candidates for transformation, leading to the transform exiting from isComputableLoopNest() without any action. More importantly, dependence information (DI) was computed for these loop nests before reaching isComputableLoopNest(), which does not require DI and relies solely on scalar evolution (SE). To improve compile-time efficiency, this patch moves the call to isComputableLoopNest() earlier in the control flow, thereby avoiding unnecessary dependence calculations. The impact of this change is evident on the compile-time-tracker, with an overall geometric mean improvement of 0.11%, while the lencode benchmark gets a more substantial benefit of 0.44%. This improvement can be tracked in the isc-ln-exp-2 branch under my repo.
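The reordering described above can be pictured with a small sketch. This is a simplified Python model under stated assumptions, not the code from the patch: `run_transform`, `is_computable_loop_nest` and `compute_dependence_info` are hypothetical stand-ins for the pass driver, the cheap SE-only viability check, and the expensive dependence computation.

```python
def run_transform(loop_nests, is_computable_loop_nest, compute_dependence_info):
    """Run the cheap viability check first; compute dependences only for survivors.

    All three callables/arguments are hypothetical stand-ins for illustration.
    """
    results = []
    for nest in loop_nests:
        # Cheap check (in the patch description: relies on scalar evolution only).
        if not is_computable_loop_nest(nest):
            continue  # bail out before any dependence work is done
        # Expensive step, now reached only by viable candidates.
        results.append((nest, compute_dependence_info(nest)))
    return results
```

With this ordering, the expensive callable is never invoked for a nest that the cheap check rejects, which is the source of the compile-time saving the message describes.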

…turns (#125280) This patch does two things.

1. Previously, when checking driver arguments, we emitted an error for unsupported values of `-mbranch-protection` when using pauthtest ABI. The reason for that was ptrauth-returns being enabled as part of pauthtest. This patch changes the check against pauthtest to a check against ptrauth-returns.
2. Similarly, check against values of the following function attribute which are unsupported with ptrauth-returns: `__attribute__((target("branch-protection=XXX`. Note that existing `validateBranchProtection` function is used, and current behavior is to ignore the unsupported attribute value, so no error is emitted.

…fixes. Re-enables compact-unwind support in JITLink, which was reverted in b04847b due to buildbot failures. The underlying cause for the failures on the buildbots was the lack of compact-unwind registration support on older Darwin OSes. Since the CompactUnwindManager pass now removes eh-frames by default, we were left with unwind-info that could not be registered. On x86-64, where eh-frame info is produced by default, the solution is to fall back to using eh-frames. On arm64 we simply can't support exceptions on older OSes. This patch updates the EHFrameRegistrationPlugin to remove the compact-unwind section (__LD,__compact_unwind) when installed, forcing use of eh-frames when the EHFrameRegistrationPlugin is used. In LLJIT, the EHFrameRegistrationPlugin continues to be used for all non-Darwin platforms, and will be added on Darwin platforms when a CompactUnwindRegistrationPlugin instance can't be created (e.g. due to missing support for compact-unwind info registration). The lit.cfg.py script is updated to check whether the host OS's default unwind info supports JIT registration, allowing tests to be disabled for older Darwin OSes on arm64.

This is a test library which is not part of libMLIR, so it should use normal LINK_LIBS instead of mlir_target_link_libraries. This fixes an issue introduced in #123910 and follows up on the fix in #125004, which added the library to DEPENDS, which is not sufficient.

Changes:

1. Fix inconsistencies in register pressure set printing. "Max Pressure" printing is inconsistent with "Bottom Pressure" and "Top Pressure". For the former, register class begins on the same line vs newline for latter. Also for the former, the first register class is on the same line, but subsequent register classes are newline separated. That's removed so all are on the same line.

   Before:

       Max Pressure: FPR8=1
       GPR32=14
       Top Pressure:
       GPR32=2
       Bottom Pressure:
       FPR8=7
       GPR32=17

   After:

       Max Pressure: FPR8=1 GPR32=14
       Top Pressure: GPR32=2
       Bottom Pressure: FPR8=7 GPR32=17

2. After scheduling an instruction, don't print pressure diff if there isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,

   Before:

       UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64
           to
       UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12
           to GPR32 -1

   After:

       UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12
           to GPR32 -1

3. Don't print excess pressure sets if there are none.

This commit moves the rotate builtin to the CLC library. It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n) intrinsic, for both scalar and vector types. The previous implementation was too cautious in its handling of the shift amount; the OpenCL rules state that the shift amount is always treated as an unsigned value modulo the bitwidth.
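The relationship between a rotate and a funnel shift can be sketched in a few lines. This is an illustrative Python model, not the CLC implementation: the helper names `fshl` and `rotate`, and the explicit `bits` parameter, are assumptions made for the example. `fshl` follows the documented semantics of the LLVM funnel-shift-left intrinsic (concatenate the two operands, shift left by the amount modulo the bit width, keep the high half).

```python
def fshl(a, b, n, bits=32):
    """Model of a funnel shift left on `bits`-wide unsigned values.

    `a` supplies the most significant half and `b` the least significant half;
    the shift amount is taken modulo the bit width.
    """
    mask = (1 << bits) - 1
    n %= bits  # Python's % is non-negative, so this matches an unsigned modulo
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << n) >> bits) & mask


def rotate(x, n, bits=32):
    """Rotate left expressed as a funnel shift of x with itself."""
    return fshl(x, x, n, bits)
```

Because the amount is reduced modulo the bit width inside `fshl`, no extra guarding of the shift amount is needed in `rotate`; for example, `rotate(1, 33)` and `rotate(1, 1)` give the same result for 32-bit values.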

PR #124961 adds intrinsics for the tcgen05 alloc/dealloc PTX instructions. This patch adds NVVM Ops for the same. Tests are added to verify the lowering to the corresponding intrinsics in tcgen05-alloc.mlir file. PTX ISA link: https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-memory-alloc-manage-instructions Signed-off-by: Durgadoss R <[email protected]>

Removed the TOSA quantization attribute used in various MLIR TOSA dialect operations in favour of using builtin attributes. Update any lit tests, conversions and transformations appropriately. Signed-off-by: Tai Ly <[email protected]> Co-authored-by: Tai Ly <[email protected]>

This PR moves the maximum number of threads in a block and blocks in a grid to the nvgpu dialect to avoid replicated code. The limits are defined here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#features-and-technical-specifications-technical-specifications-per-compute-capability

When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be a warning

```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```

which is followed by a large number of characters in stdout. Fix this to avoid stdout outputting a large number of characters (3e5).

The previous implementation had false positive/negative cases in the analysis of the loop carried dependency. A missed dependency case is caused by incorrect analysis of address increments. This is fixed by strict analysis of recursive definitions. See added test swp-carried-dep4.mir. Excessive dependency detection is fixed by improving the formula for determining the overlap of address ranges to be accessed. See added test swp-carried-dep5.mir.

If the input contains an odd number of shuffled vectors, the last 2 shuffles are shuffled with the same first vector. This situation needs to be processed correctly: when the first vector is requested for the first time, extract it from the source vector; when it is requested the second time, reuse the previous result. The second vector should be extracted in both cases. Fixes #125269 Reviewers: topperc, preames Reviewed By: preames Pull Request: #125693

There were a bunch of spots in ROCDL.td where we were defining our own llvmBuilder call which could have been generated using the default built-in one on LLVM_IntrOpBase. This commit cleans up such usages in the interests of potentially enabling ROCDL import in the future and of making best practices more obvious. The one breaking change is renaming WaitcntOp to SWaitcntOp, which should have minimal impact.

The threadset clause is part of free-agent threads, but free-agent threads also involve a runtime implementation. So, added a separate entry for the threadset clause and claimed it for clang. --------- Co-authored-by: Sunil Kuravinakop <[email protected]>

…-V (#125905) This is a follow-up to 5df62bd. That commit should not have needed to make the vector.insert and vector.extract conversions to SPIR-V directly handle the static poison index case, as there is a fold from those to ub.poison, and a conversion pattern from ub.poison to spirv.Undef, however:

- The ub.poison fold result could not be materialized by the vector dialect (fixed as of d13940e).
- The conversion pattern wasn't being populated in VectorToSPIRVPass, which is used by the tests. This commit changes this.
- The ub.poison to spirv.Undef pattern rejected non-scalar types, which prevented its use for vector results. It is unclear why this restriction existed; a remark in D156163 said this was to avoid converting "user types", but it is not obvious why these shouldn't be permitted (the SPIR-V specification allows OpUndef for all types except OpTypeVoid). This commit removes this restriction.

With these fixed, this commit removes the redundant static poison index handling, and updates the tests.

Summary: This probably wasn't the intended result, but the code here causes OpenMP to always link in `ockl.bc` which was intentionally not linked. This results in the OCKL definitions conflicting with the OpenMP ones and also prevents them from being optimized out (Might be fixed with newer ROCm that actually builds the visibility correctly). I'm pretty sure the only reason this didn't break the tests is because we're smart and pass `-nogpulib` there to keep the environment from being poisoned with stuff like this.

…125732) Reapplies #122471 This is based on #125699, only the latest commit is relevant. With changes in this PR and the parent one, the previously reported failures in the Fujitsu(*) test suite should hopefully be resolved (I verified all the 14 reported failures and they pass now). (*) https://linaro.atlassian.net/browse/LLVM-1521

Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass samesign information to isImpliedViaOperations, and teach it to call CmpPredicate::getPreferredSignedPredicate, effectively making it optimize with samesign information.

…r uses (#124327) Remove the restriction that scheduling rematerialization candidates cannot have virtual reg uses. Currently, this only allows for virtual reg uses which are already live at the rematerialization point, so bring in allUsesAvailableAt to check for this condition. Because of this condition, the uses of the remats will already be live in to the region, so the remat won't increase live-in pressure. Add an expensive check to check this condition.

…ing definition (#126061) Having them defined out-of-line results in a significant amount of boilerplate without improving readability, since they're just one or two lines long anyway. As a drive-by, add comments between the declarations to make them easier to distinguish.

…(y - z) + z even with Zicond. (#125772)" With the test changes. Original message: The Zicond version of this requires an li instruction and an additional register. Without Zicond we match this in a DAGCombine on RISCVISD::SELECT_CC. This PR has 2 commits. I'll pre-commit the test change if this looks good.

Since line zero means "no line information", when symbolizing a location (an address or an inline frame associated with the address) that has a line zero location, we shouldn't include other irrelevant data (like filename) in the result.
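As a rough illustration of that rule, the sketch below models a symbolized location and clears the fields that carry no meaning when the line is zero. It is a hypothetical Python model, not the symbolizer's code: `SourceLocation` and `normalize_line_zero` are invented names, and clearing the column along with the filename is an assumption made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical record for one symbolized frame."""
    function: str
    filename: Optional[str]
    line: int
    column: int


def normalize_line_zero(loc: SourceLocation) -> SourceLocation:
    """Drop file/column data when line == 0, i.e. when there is no line info."""
    if loc.line == 0:
        # Assumption: the column is treated as irrelevant together with the filename.
        return replace(loc, filename=None, column=0)
    return loc
```

The function name is kept in both cases, since the frame itself is still known even when its line information is not.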

#125923) NaryReassociate would crash on expressions like the one in the added test that involved pointers where the size of the type was greater than the index width of the pointer, causing calls to SCEV's zext expression on types that didn't need to be zero-extended. This commit fixes the issue.

…25629) LLVM itself is generally moving away from using `undef` and towards using `poison`, to the point of having a lint that catches new uses of `undef` in tests. In order to not trip the lint on new patterns and to conform to the evolution of LLVM:

- Rename various ::undef() methods on StructBuilder subclasses to ::poison()
- Audit the uses of UndefOp in the MLIR libraries and replace almost all of them with PoisonOp

The remaining uses of `undef` are initializing `uninitialized` memrefs, explicit conversions to undef from SPIR-V, and a few cases in AMDGPUToROCDL where usage like

    %v = insertelement <M x iN> undef, iN %v, i32 0
    %arg = bitcast <M x iN> %v to i(M * N)

is used to handle "i32" arguments that are really packed vectors of smaller types that won't always be fully initialized.

…dSkipZero is not set (#126044) I ran into this while working on a different patch where I'm emitting a zero-valued DWARF enum field which shouldn't be skipped. This patch checks the (currently unused) `ShouldSkipZero` before deciding to skip printing this field. Based on git history this seems like an oversight from the initial refactor that introduced this. We have a similar check in `printInt`. Wasn't sure how to best test this, but tests in an upcoming patch rely on this functionality (see #126045). Currently the only place `ShouldSkipZero` is set to `false` is when emitting the `DW_LANG_` enum. But the language codes start at `0x1`. So it never exercised this codepath (and we should probably just make it not pass this parameter).
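The intended behavior can be summarized with a tiny model. This is a hedged Python sketch rather than the actual printer code: `print_dwarf_field` is a hypothetical helper, its output format is invented, and only the `should_skip_zero` flag mirrors the `ShouldSkipZero` parameter named in the message.

```python
def print_dwarf_field(name, value, should_skip_zero=True):
    """Return the printed text for a field, or "" when a zero value is skipped.

    Hypothetical helper: before the fix described above, the zero check would
    have ignored the flag and always skipped zero-valued fields.
    """
    if value == 0 and should_skip_zero:
        return ""
    return f"{name}: {value}"
```

With `should_skip_zero=False`, a zero-valued enum field is printed instead of being silently dropped, which is the case the message says the upcoming patch relies on.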

…#122275) Goals:

1. To add syntax and semantics to 'batch_matmul' without changing any of the existing syntax expectations for current usage. batch_matmul is still just batch_matmul.
2. Move the definition of batch_matmul from linalg OpDsl to tablegen ODS infra.

Scope of this patch: To expose broadcast and transpose semantics on the 'batch_matmul'.

The broadcast and transpose semantics are as follows: By default, 'linalg.batch_matmul' behavior will remain as is. Broadcast and Transpose semantics can be applied by specifying the explicit attribute 'indexing_maps' as shown below. This is a list attribute, so the list must include all the maps if specified.

Example Transpose:

```
linalg.batch_matmul indexing_maps = [
    affine_map< (d0, d1, d2, d3) -> (d0, d3, d1)>, //transpose
    affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
    affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
  ]
  ins (%arg0, %arg1: memref<2x5x3xf32>,memref<2x5x7xf32>)
  outs (%arg2: memref<2x3x7xf32>)
```

Example Broadcast:

```
linalg.batch_matmul indexing_maps = [
    affine_map< (d0, d1, d2, d3) -> (d3)>, //broadcast
    affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
    affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
  ]
  ins (%arg0, %arg1: memref<5xf32>,memref<2x5x7xf32>)
  outs (%arg2: memref<2x3x7xf32>)
```

Example Broadcast and transpose:

```
linalg.batch_matmul indexing_maps = [
    affine_map< (d0, d1, d2, d3) -> (d1, d3)>, //broadcast
    affine_map< (d0, d1, d2, d3) -> (d0, d2, d3)>, //transpose
    affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
  ]
  ins (%arg0, %arg1: memref<3x5xf32>, memref<2x7x5xf32>)
  outs (%arg2: memref<2x3x7xf32>)
```

RFCs and related PR:
https://discourse.llvm.org/t/rfc-linalg-opdsl-constant-list-attribute-definition/80149
https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863
https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
#115319
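To make the role of the indexing maps in the examples above concrete, here is a small reference loop nest. It is a Python sketch under assumptions, not the linalg implementation: `batch_matmul` and `index` are hypothetical helpers, the maps are passed as plain functions of (d0, d1, d2, d3) = (batch, m, n, k), and the output map is fixed to (d0, d1, d2) as in the examples.

```python
from itertools import product


def index(nested, idx):
    """Read nested[i0][i1]... for an index tuple of any rank."""
    for i in idx:
        nested = nested[i]
    return nested


def batch_matmul(a, b, dims, a_map, b_map):
    """Compute C[d0][d1][d2] += A[a_map(d)] * B[b_map(d)] over all (d0, d1, d2, d3)."""
    batch, m, n, k = dims
    c = [[[0.0] * n for _ in range(m)] for _ in range(batch)]
    for d0, d1, d2, d3 in product(range(batch), range(m), range(n), range(k)):
        lhs = index(a, a_map(d0, d1, d2, d3))
        rhs = index(b, b_map(d0, d1, d2, d3))
        c[d0][d1][d2] += lhs * rhs
    return c


# Maps written as functions; the names are illustrative only.
default_a = lambda d0, d1, d2, d3: (d0, d1, d3)    # A is batch x m x k
default_b = lambda d0, d1, d2, d3: (d0, d3, d2)    # B is batch x k x n
transposed_a = lambda d0, d1, d2, d3: (d0, d3, d1)  # A is batch x k x m
broadcast_a = lambda d0, d1, d2, d3: (d3,)          # A is a vector of length k
```

For instance, `batch_matmul(a_t, b, (1, 2, 2, 2), transposed_a, default_b)` with `a_t` stored as batch x k x m yields the same result as the default maps applied to the untransposed operand, and `broadcast_a` reuses one length-k vector for every batch and every row.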

pull bot added the ⤵️ pull label Jan 16, 2025

tbaederr and others added 29 commits February 5, 2025 08:04

[clang][bytecode] Print desriptor PrimType in dump() (#125726)

e90f9b4

[clang][bytecode] Destroy local variables in reverse order (#125727)

16c721f

See the attached test case.

CodeGen][NewPM] Port PostRAScheduler to NPM. (#125798)

44f638f

[ARM] Add "avoidmuls" to STAR-MC1 also (#123706)

409fa78

PR #112540 as the reference.

[OpenMP][ASan] Enable ASan Instrumentation for AMDGPUOpenMPToolChain. (…

646d352

…#124754) Enable device code ASan instrumentation for openmp offload applications using option '-fsanitize=address'.

[lldb] Document lldb x packet deprecation. (#125682)

ad38c4c

[flang] Move FIRSupport dependency to correct place (#125697)

ee76bda

This library is provided by flang, not MLIR, so it should not be part of MLIR_LIBS. Fixes an issue introduced in #120966.

[InstCombine] Drop nowrap flags in foldBitCeil (#125817)

922ab66

For convenience this patch drops nsw for `sub`. It also allows this fold with `ctlz_zero_undef`. Alive2: https://alive2.llvm.org/ce/z/VmvqSt

[SLP][X86] Add test coverage for #124993

4fdd28b

[NFC][ValueTracking] Hoist the matching of RHS constant (#125818)

8bba8a5

[TableGen][Docs] Fix productionlists for SimpleValue (#123751)

439de72

Previously the grammar tokens SimpleValue2 through SimpleValue9 were unreferenced. This ties them together so that the grammar makes more sense.

[LLD][COFF] Emit locally imported EC symbols for ARM64X (#125527)

8cb3d7b

[IR][NFC] Remove obsolete comments in BinaryOperator::swapOperands (#…

6c84d64

…125819) Closes #125438

[flang][cmake] Fix bcc dependencies (#125822)

f9af5c1

The Fortran libraries are not part of MLIR, so they should use target_link_libraries() rather than mlir_target_link_libraries(). This fixes an issue introduced in #120966.

[TableGen][Docs] Fix productionlists for assert and dump (#123739)

b275309

These were referring to nonexistent grammar tokens instead of `Value`.

[LLD][COFF] Use EC symbol table for output DEF file on ARM64X (#125531)

e596387

For consistency with input def handling.

[clang][bytecode] Handle CXXPseudoDestructorExprs (#125835)

ee25a85

Make lifetime management more explicit. We're only using this for CXXPseudoDestructorExprs for now but we need this to handle std::construct_at/placement-new after destructor calls later anyway.

krzysz00 and others added 29 commits February 6, 2025 11:38

[compiler-rt] Fixed Android 8.1 getauxval(AT_PAGESZ) crashes if ca…

6575154

…lled from `.preinit_array`. (#113427) (#116121) Signed-off-by: funsafe-ptr <[email protected]>

[VectorCombine] foldInsExtVectorToShuffle - ensure we call getShuffle…

eb2b453

…Cost with the input operand type, not the result Typo in #121216 Fixes #126085

[clang-linker-wrapper][lit] Fix SPIR-V OpenMP ELF test (#126098)

4717bab

Fix for lit fail from #125737 Signed-off-by: Sarnie, Nick <[email protected]>

[flang] Adding a couple of tests to the alias analysis (#125917)

8d5f280

To establish a baseline for new tests mentioned in #117785, adding them here independently.

[BOLT] Exit with error code on missing DWO CU (#125976)

b884be8

If BOLT fails to locate DWO CU when using split DWARF, this signifies an issue with the input (missing .dwo) rather than an internal assertion.

IndVarSimplify: improve a test, stripping undef (#126069)

34624d8

[libc++][NFC] Remove __default_allocator_type aliases (#126066)

1117568

These aliases are never used, so we can ditch them.

[X86] avx512-arith.ll - regenerate VPTERNLOG comments

5c3d146

[AMDGPU] Avoid repeated hash lookups (NFC) (#126001)

975bba6

[CodeGen] Avoid repeated hash lookups (NFC) (#126002)

91c188b

[X86] Avoid repeated hash lookups (NFC) (#126006)

ff94b03

[NFC][TableGen] Adopt Emitter::OptClass in InstrInfoEmitter (#125971)

337604e

- Use `Emitter::OptClass` to invoke `InstrInfoEmitter::run` and eliminate the `EmitInstrInfo` function.

[clang-linker-wrapper][lit] Temporarily disable OpenMP SPIR-V ELF test (

a183184

#126104) The fix requires more investigation, and it's a test issue so reverting the product changes should not be necessary. Signed-off-by: Sarnie, Nick <[email protected]>

[clang-doc] Precommit end-to-end test for --repository option (#122565

f8e53a9

) We test the `--repository` output in the unit tests, but that option fails to change the HTML output in the end-to-end tests. Upcoming patches will address the incorrect behavior.

[lldb][DWARFASTParser][NFC] Fix doxygen comment

c4d75b1

pull bot merged commit c4d75b1 into Ericsson:main Feb 6, 2025
2 checks passed
